AITopics | hybrid rl

Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs

Neural Information Processing Systems, Mar-22-2026, 16:05:41 GMT

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assumption. While Li et al. (2023) provided an affirmative answer to this question in the tabular PAC RL case, the question remains unsettled for both the regret-minimizing RL case and the non-tabular case. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs

Kevin Tan, Wei Fan, Yuting Wei (Department of Statistics and Data Science, The Wharton School, University of Pennsylvania)

Neural Information Processing Systems, Feb-18-2026, 08:25:09 GMT

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022b) is whether hybrid RL can improve upon the existing lower bounds established for purely offline or online RL without requiring that the behavior policy visit every state and action the optimal policy does. While Li et al. (2023b) provided an affirmative answer for tabular PAC RL, the question remains unsettled for both the regret-minimizing and non-tabular cases. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.40)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs

Kevin Tan, Wei Fan, Yuting Wei (Department of Statistics and Data Science, The Wharton School, University of Pennsylvania)

Neural Information Processing Systems, Oct-10-2025, 18:25:27 GMT

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022b) is whether hybrid RL can improve upon the existing lower bounds established for purely offline or online RL without requiring that the behavior policy visit every state and action the optimal policy does. While Li et al. (2023b) provided an affirmative answer for tabular PAC RL, the question remains unsettled for both the regret-minimizing and non-tabular cases. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.

algorithm, hybrid rl, partition, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.40)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization

Sifaou, Houssem, Simeone, Osvaldo

arXiv.org Artificial Intelligence, Sep-19-2025

Optimizing a reinforcement learning (RL) policy typically requires extensive interactions with a high-fidelity simulator of the environment, which are often costly or impractical. Offline RL addresses this problem by allowing training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interactions with a single simulator of the environment. In many real-world scenarios, however, multiple simulators with varying levels of fidelity and computational cost are available. In this work, we study multi-fidelity hybrid RL for policy optimization under a fixed cost budget. We introduce multi-fidelity hybrid RL via information gain maximization (MF-HRL-IGM), a hybrid offline-online RL algorithm that implements fidelity selection based on information gain maximization through a bootstrapping approach. Theoretical analysis establishes the no-regret property of MF-HRL-IGM, while empirical evaluations demonstrate its superior performance compared to existing benchmarks.

machine learning, reinforcement learning, simulator, (14 more...)

arXiv.org Artificial Intelligence

arXiv:2509.14848

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs

Neural Information Processing Systems, May-27-2025, 18:57:36 GMT

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assumption. While Li et al. (2023) provided an affirmative answer to this question in the tabular PAC RL case, the question remains unsettled for both the regret-minimizing RL case and the non-tabular case. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy.

algorithm, learning break sample size barrier, linear mdp, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs

Tan, Kevin, Fan, Wei, Wei, Yuting

arXiv.org Artificial Intelligence, Aug-8-2024

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assumption. While Li et al. (2023) provided an affirmative answer to this question in the tabular PAC RL case, the question remains unsettled for both the regret-minimizing RL case and the non-tabular case. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without single-policy concentrability. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.

algorithm, partition, wagenmaker and pacchiano, (13 more...)

arXiv.org Artificial Intelligence

arXiv:2408.04526

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors

Cramer, Emma, Frauenknecht, Bernd, Sabirov, Ramil, Trimpe, Sebastian

arXiv.org Artificial Intelligence, Jul-1-2024

Combining Reinforcement Learning (RL) with a prior controller can yield the best out of two worlds: RL can solve complex nonlinear problems, while the control prior ensures safer exploration and speeds up training. Prior work largely blends both components with a fixed weight, neglecting that the RL agent's performance varies with the training progress and across regions in the state space. Therefore, we advocate for an adaptive strategy that dynamically adjusts the weighting based on the RL agent's current capabilities. We propose a new adaptive hybrid RL algorithm, Contextualized Hybrid Ensemble Q-learning (CHEQ). CHEQ combines three key ingredients: (i) a time-invariant formulation of the adaptive hybrid RL problem treating the adaptive weight as a context variable, (ii) a weight adaption mechanism based on the parametric uncertainty of a critic ensemble, and (iii) ensemble-based acceleration for data-efficient RL. Evaluating CHEQ on a car racing task reveals substantially stronger data efficiency, exploration safety, and transferability to unknown scenarios than state-of-the-art adaptive hybrid RL methods.

formulation, reinforcement learning, rl agent, (12 more...)

arXiv.org Artificial Intelligence

arXiv:2406.19768

Country:

Europe > Germany (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report (0.50)

Industry:

Automobiles & Trucks (0.93)
Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning

Tesauro, Gerald, Das, Rajarshi, Chan, Hoi, Kephart, Jeffrey, Levine, David, Rawson, Freeman, Lefurgy, Charles

Neural Information Processing Systems, Dec-31-2008

Businesses want to save power without sacrificing performance. This paper presents a reinforcement learning approach to simultaneous online management of both performance and power consumption. We apply RL in a realistic laboratory testbed using a Blade cluster and dynamically varying HTTP workload running on a commercial web applications middleware platform. We embed a CPU frequency controller in the Blade servers' firmware, and we train policies for this controller using a multi-criteria reward signal depending on both application performance and CPU power consumption. Our testbed scenario posed a number of challenges to successful use of RL, including multiple disparate reward functions, limited decision sampling rates, and pathologies arising when using multiple sensor readings as state variables. We describe innovative practical solutions to these challenges, and demonstrate clear performance improvements over both hand-designed policies as well as obvious "cookbook" RL implementations.

power consumption, response time, server, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
